Results 1 - 5 of 5
1.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7123-7141, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36417745

ABSTRACT

Scene text spotting is of great importance to the computer vision community due to its wide variety of applications. Recent methods attempt to introduce linguistic knowledge for challenging recognition rather than relying on pure visual classification. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) a language model that receives noisy input. Correspondingly, we propose an autonomous, bidirectional and iterative ABINet++ for scene text spotting. First, autonomy enforces explicit language modeling by decoupling the recognizer into a vision model and a language model and blocking gradient flow between the two models. Second, a novel bidirectional cloze network (BCN) is proposed as the language model, based on bidirectional feature representation. Third, we propose an iterative-correction execution scheme for the language model, which effectively alleviates the impact of noisy input. Additionally, based on an ensemble of the iterative predictions, we develop a self-training method that learns effectively from unlabeled images. Finally, to polish ABINet++ for long text recognition, we propose aggregating horizontal features by embedding Transformer units inside a U-Net, and we design a position and content attention module that integrates character order and content to attend precisely to character features. ABINet++ achieves state-of-the-art performance on both scene text recognition and scene text spotting benchmarks, consistently demonstrating the superiority of our method in various environments, especially on low-quality images.
Besides, extensive experiments in both English and Chinese also show that a text spotter incorporating our language modeling method can significantly improve its performance in both accuracy and speed compared with commonly used attention-based recognizers. Code is available at https://github.com/FangShancheng/ABINet-PP.
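The abstract describes the bidirectional cloze network (BCN) only at a high level. The following is a minimal sketch of what "cloze-style" bidirectional context can mean in an attention-based language model. It assumes a diagonal attention mask in which each character position may attend to every other position (left and right context) but not to itself. The function names are hypothetical and are not taken from the ABINet-PP code.

```python
from typing import List


def cloze_attention_mask(length: int) -> List[List[bool]]:
    """Return a length x length mask; mask[i][j] is True if query position i
    may attend to key position j.

    Every position sees all other positions (both directions) but never
    itself, so character i must be inferred from its surrounding context,
    as in a cloze test. (Assumed masking scheme, for illustration only.)
    """
    return [[i != j for j in range(length)] for i in range(length)]


def visible_context(text: str, position: int) -> str:
    """Render what one position can attend to, with '_' for masked slots."""
    mask = cloze_attention_mask(len(text))
    return "".join(ch if mask[position][j] else "_" for j, ch in enumerate(text))


if __name__ == "__main__":
    # Each line shows the context available when predicting one character.
    for i in range(len("text")):
        print(visible_context("text", i))
```

A unidirectional (left-to-right) model would instead hide every position at or after i. The diagonal mask lets a single pass use context from both sides.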

2.
World Wide Web ; 26(2): 539-559, 2023.
Article in English | MEDLINE | ID: mdl-35528264

ABSTRACT

Developmental dysplasia of the hip (DDH) is one of the most common diseases in children. Because analyzing the relevant medical images requires considerable expertise, online automatic diagnosis of DDH has attracted researchers' interest. Traditional implementations of online diagnosis face challenges in reliability and interpretability. In this paper, we establish an online diagnosis tool based on a multi-task hourglass network, which can accurately extract landmarks to detect the extent of hip dislocation and predict the age of the femoral head. The multi-task hourglass network trains an encoder-decoder network to regress the landmarks and to predict the developmental age for online DDH diagnosis. With the support of precise image analysis and fast GPU computing, our method can help overcome the shortage of medical resources and enable telehealth for DDH diagnosis. Applied to a dataset of DDH X-ray images, our method achieves a mean landmark-detection error of 4.64 pixels relative to the results of human experts. Moreover, it improves the accuracy of femoral-head age prediction to 89%. Our online automatic diagnosis system has served 112 patients, and the results demonstrate the effectiveness of our method.
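The abstract reports landmark accuracy as a mean pixel error against human experts but does not define the metric. A common definition, assumed here, is the mean Euclidean distance between each predicted landmark and the corresponding expert-annotated landmark. `mean_pixel_error` is a hypothetical helper, not part of the authors' tool.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixels


def mean_pixel_error(pred: Sequence[Point], ref: Sequence[Point]) -> float:
    """Mean Euclidean distance in pixels between paired landmarks.

    Assumed metric definition; the paper may aggregate differently
    (for example per image first, then across images).
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("need equally sized, non-empty landmark lists")
    total = sum(math.dist(p, r) for p, r in zip(pred, ref))
    return total / len(pred)


if __name__ == "__main__":
    predicted = [(10.0, 12.0), (40.0, 44.0)]
    expert = [(10.0, 12.0), (43.0, 48.0)]
    print(mean_pixel_error(predicted, expert))
```

Because the error is measured in pixels, it depends on image resolution. Converting to millimeters requires the pixel spacing of each radiograph.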

3.
Nanomaterials (Basel) ; 12(10)2022 May 16.
Article in English | MEDLINE | ID: mdl-35630913

ABSTRACT

The past decades have witnessed surging demand for wearable electronics, for which thermoelectrics (TEs) are considered a promising self-charging technology, as they can convert skin heat directly into electricity. Bi2Te3 is the most widely used TE material at room temperature, owing to its high zT of ~1. However, it is difficult to integrate Bi2Te3 into wearable TEs because of its intrinsic rigidity. Bi2Te3 can become flexible when made thin enough, but this implies a small electrical and thermal load, severely restricting the power output. Herein, we developed a Bi2Te3/nickel foam (NiFoam) composite film through solvothermal deposition of Bi2Te3 nanoplates into porous NiFoam. Owing to the mesh structure and ductility of NiFoam, the film, with a thickness of 160 µm, exhibited a high figure of merit for flexibility of 0.016, connoting higher output. Moreover, the film revealed a high tensile strength of 12.7 ± 0.04 MPa and a maximum elongation of 28.8%. In addition, owing to the film's high electrical conductivity and enhanced Seebeck coefficient, an outstanding power factor of 850 µW m⁻¹ K⁻² was achieved, which is among the highest ever reported. A module fabricated with five such n-type legs, connected electrically in series and thermally in parallel, showed an output power of 22.8 nW at a temperature difference of 30 K. This work offers a cost-effective avenue for making highly flexible TE films to power wearable electronics by intercalating TE nanoplates into porous, mesh-structured materials.
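The power factor and module output quoted above are tied together by standard thermoelectric relations: the power factor is PF = S²σ, and n identical legs in electrical series across a temperature difference ΔT give an open-circuit voltage nSΔT and a matched-load maximum power of (nSΔT)²/(4R), where R is the module's internal resistance. Below is a minimal sketch of that arithmetic. Any input values are placeholders, not the paper's measured Seebeck coefficient, conductivity, or resistance. The model also ignores contact resistance and the fact that the temperature drop across the legs is smaller than the applied gap, so it gives an idealized upper bound.

```python
def power_factor(seebeck_v_per_k: float, conductivity_s_per_m: float) -> float:
    """Power factor PF = S^2 * sigma, in W m^-1 K^-2."""
    return seebeck_v_per_k ** 2 * conductivity_s_per_m


def module_max_power(n_legs: int, seebeck_v_per_k: float,
                     delta_t_k: float, internal_resistance_ohm: float) -> float:
    """Idealized maximum output (W) of n identical legs in electrical series.

    Open-circuit voltage V = n * S * dT; with a matched external load
    (R_load = R_internal) the delivered power is V^2 / (4 * R_internal).
    S is squared, so the negative Seebeck coefficient of n-type legs is fine.
    """
    v_open = n_legs * seebeck_v_per_k * delta_t_k
    return v_open ** 2 / (4 * internal_resistance_ohm)


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    pf = power_factor(-100e-6, 1.0e5)
    print(f"PF = {pf * 1e6:.0f} uW m^-1 K^-2")
    p = module_max_power(5, -100e-6, 30.0, 1.0)
    print(f"P_max = {p * 1e9:.1f} nW")
```

Because output scales with (nΔT)² but only inversely with R, adding legs in series raises voltage faster than it raises resistance when each leg's resistance is small.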

4.
IEEE Trans Image Process ; 30: 5848-5861, 2021.
Article in English | MEDLINE | ID: mdl-34152986

ABSTRACT

Weakly supervised temporal action detection has better scalability and practicability than fully supervised action detection in real-world deployment. However, it is difficult to learn a robust model without temporal action boundary annotations. In this paper, we propose an end-to-end Multi-Scale Structure-Aware Network (MSA-Net) for weakly supervised temporal action detection that explores both the global structure information of a video and the local structure information of actions. The proposed MSA-Net has several merits. First, to localize actions of different durations, each video is encoded into feature representations at different temporal scales. Second, based on the multi-scale feature representation, the proposed model employs two effective structure modeling mechanisms, global structure modeling and local structure modeling, which learn discriminative structure-aware representations for robust and complete action detection. To the best of our knowledge, this is the first work to fully explore global and local structure information in a unified deep model for weakly supervised action detection. Extensive experimental results on two benchmark datasets demonstrate that the proposed MSA-Net performs favorably against state-of-the-art methods.
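The abstract says each video is encoded at several temporal scales so that actions of different durations can be localized, but it does not say how. One simple way, assumed here, is non-overlapping average pooling of per-snippet feature vectors with several window sizes. The window sizes and helper names are hypothetical, and MSA-Net may well use a different (for example, learned) multi-scale encoder.

```python
from typing import Dict, List, Sequence, Tuple

Feature = List[float]  # one feature vector per video snippet


def temporal_avg_pool(features: Sequence[Feature], window: int) -> List[Feature]:
    """Average consecutive snippet features in non-overlapping windows.

    A trailing chunk shorter than `window` is averaged over its own length.
    """
    if window < 1 or not features:
        raise ValueError("window must be >= 1 and features non-empty")
    pooled = []
    for start in range(0, len(features), window):
        chunk = features[start:start + window]
        dim = len(chunk[0])
        pooled.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return pooled


def multi_scale_features(features: Sequence[Feature],
                         windows: Tuple[int, ...] = (1, 2, 4)) -> Dict[int, List[Feature]]:
    """Encode one video at several temporal resolutions (window -> sequence)."""
    return {w: temporal_avg_pool(features, w) for w in windows}


if __name__ == "__main__":
    snippets = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
    for w, seq in multi_scale_features(snippets).items():
        print(w, seq)
```

Coarse scales summarize long stretches and suit long actions, while the finest scale keeps the temporal detail needed for short ones.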

5.
IEEE Trans Med Imaging ; 39(12): 3944-3954, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32746137

ABSTRACT

Developmental dysplasia of the hip (DDH) is one of the most common orthopedic disorders in infants and young children. Accurately detecting and identifying the misshapen anatomical landmarks plays a crucial role in the diagnosis of DDH. However, variation during calcification and deformity caused by dislocation make detecting the misshapen pelvic landmarks difficult for both human experts and computers. Generally, the anatomical landmarks exhibit stable morphological features in local regions and rigid structural features over long ranges, both of which can serve as strong cues for identifying the landmarks. In this paper, we investigate local morphological features and global structural features for misshapen landmark detection with a novel Pyramid Non-local UNet (PN-UNet). First, we mine the local morphological features with a series of convolutional neural network (CNN) stacks and convert the detection of a landmark into the segmentation of the landmark's local neighborhood by UNet. Second, a non-local module is employed to capture the global structural features with high-level structural knowledge. With end-to-end, accurate detection of pelvic landmarks, we realize a fully automatic and highly reliable diagnosis of DDH. In addition, we construct a dataset of 10,000 pelvic X-ray images. It is the first public dataset for diagnosing DDH and has already been released for open research. To the best of our knowledge, this is the first attempt to apply deep learning methods to the diagnosis of DDH. Experimental results show that our approach achieves excellent precision in landmark detection (average point-to-point error of 0.9286 mm) and in illness diagnosis, outperforming human experts. The project is available at http://imcc.ustc.edu.cn/project/ddh/.
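PN-UNet recasts landmark detection as segmenting a small neighborhood around each landmark. Below is a minimal sketch of that conversion in both directions. It assumes a circular neighborhood of fixed radius and reads the landmark back out as the centroid of the predicted mask. The radius, the helper names, and the millimeter conversion through a per-pixel spacing are illustrative assumptions, not details given in the abstract.

```python
import math
from typing import List, Tuple

Mask = List[List[int]]


def landmark_to_mask(height: int, width: int, center: Tuple[int, int],
                     radius: int) -> Mask:
    """Segmentation target: 1 for pixels within `radius` of (row, col)."""
    r0, c0 = center
    return [[1 if (r - r0) ** 2 + (c - c0) ** 2 <= radius ** 2 else 0
             for c in range(width)]
            for r in range(height)]


def mask_to_landmark(mask: Mask) -> Tuple[float, float]:
    """Recover a landmark estimate as the centroid of foreground pixels."""
    pixels = [(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("mask has no foreground pixels")
    n = len(pixels)
    return (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)


def point_error_mm(pred: Tuple[float, float], ref: Tuple[float, float],
                   spacing_mm: float) -> float:
    """Point-to-point error in mm, assuming square pixels of `spacing_mm`."""
    return math.dist(pred, ref) * spacing_mm


if __name__ == "__main__":
    target = landmark_to_mask(12, 16, center=(5, 7), radius=2)
    print(mask_to_landmark(target))
```

Segmenting a neighborhood rather than regressing a single coordinate gives the network a denser training signal, and the centroid read-out averages out small errors along the mask boundary.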


Subjects
Developmental Dysplasia of the Hip; Child; Child, Preschool; Humans; Infant; Neural Networks, Computer; Pelvis/diagnostic imaging; Radiography